Generalization of the Ewens sampling formula to arbitrary fitness landscapes

Authors

  • Pavel Khromov
  • Constantin D Malliaris
  • Alexandre V Morozov
Abstract

When considering the evolution of transcribed regions, regulatory sequences, and other genomic loci, we often face a situation in which the number of allelic states greatly exceeds the size of the population. In this limit, the population eventually reaches a steady state characterized by mutation-selection-drift balance. Although new alleles continue to be explored through mutation, the statistics of the population, and in particular the probabilities of observing specific allelic configurations in samples taken from the population, do not change with time. In the absence of selection, the probabilities of allelic configurations are given by the Ewens sampling formula, which is widely used in population genetics to detect deviations from neutrality. Here we develop an extension of this formula to arbitrary fitness distributions. Although our approach is general, we focus on a class of fitness landscapes, inspired by recent high-throughput genotype-phenotype maps, in which alleles can occupy one of several distinct phenotypic states. This class of landscapes yields sampling probabilities that are more computationally tractable and can serve as a basis for inferring selection signatures from genomic data. Using an efficient numerical implementation of the sampling probabilities, we demonstrate that, for a sizable range of mutation rates and selection coefficients, the steady-state allelic diversity departs from neutral expectations. It can therefore be used to infer selection coefficients, as well as other evolutionary parameters, from population data. We also carry out numerical simulations to test the approximations used in deriving our sampling formulas, such as the infinite-allele limit and the "full connectivity" assumption inherent in the Ewens theory, under which each allele can mutate into any other allele. We find that, at least for the specific numerical examples studied, our theory remains sufficiently accurate even when these assumptions are relaxed.
Thus, our framework establishes both theoretical and practical foundations for inferring selection signatures from population-level genomic sequence samples.
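For readers less familiar with the neutral baseline, the classical Ewens sampling formula can be evaluated exactly in a few lines. The sketch below is an illustration only: it implements the standard neutral formula, not the selection-dependent generalization developed in this paper, and the function names (ewens_probability, partitions, expected_num_alleles) are hypothetical. It assumes the infinite-alleles model with a population-scaled mutation rate θ > 0 (conventions for θ differ, e.g. θ = 4Nμ for diploids) and uses exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def rising_factorial(x, n):
    """Return x (x + 1) ... (x + n - 1), the rising factorial with n factors."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def ewens_probability(allele_counts, theta):
    """Classical (neutral) Ewens sampling formula, illustrative sketch.

    allele_counts: one positive int per distinct allele in the sample,
        e.g. [2, 1] means one allele seen twice and another seen once.
    theta: population-scaled mutation rate (> 0), as an int or Fraction.

    P(a_1..a_n) = n! / theta^(n) * prod_j (theta / j)^a_j / a_j!,
    where a_j is the number of alleles seen exactly j times and
    theta^(n) is the rising factorial theta (theta + 1) ... (theta + n - 1).
    """
    theta = Fraction(theta)
    n = sum(allele_counts)
    a = Counter(allele_counts)  # maps j -> a_j
    prob = Fraction(factorial(n)) / rising_factorial(theta, n)
    for j, a_j in a.items():
        prob *= (theta / j) ** a_j / factorial(a_j)
    return prob


def partitions(n, max_part=None):
    """Yield every allelic configuration of a sample of size n
    (integer partitions of n, parts in non-increasing order)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield [k] + rest


def expected_num_alleles(n, theta):
    """Expected number of distinct alleles: sum_{i=0}^{n-1} theta / (theta + i)."""
    theta = Fraction(theta)
    return sum(theta / (theta + i) for i in range(n))


if __name__ == "__main__":
    print(ewens_probability([2, 1], 1))  # 1/2
    # Probabilities over all configurations of a sample of 4 sum to one.
    print(sum(ewens_probability(p, 1) for p in partitions(4)))  # 1
    print(expected_num_alleles(4, 1))  # 25/12
```

With θ = 1 the formula reduces to the cycle-type distribution of a uniformly random permutation, which is a convenient sanity check. Summing over all configurations recovers a total probability of one, and averaging the number of distinct alleles over configurations recovers the closed-form expectation.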


Similar resources

An Asymptotic Sampling Formula for the Coalescent with Recombination

The Ewens sampling formula (ESF) is a one-parameter family of probability distributions with a number of intriguing combinatorial connections. This elegant closed-form formula first arose in biology as the stationary probability distribution of a sample configuration at one locus under the infinite-alleles model of mutation. Since its discovery in the early 1970s, the ESF has been used in various bi...


On Eigen's Quasispecies Model, Two-Valued Fitness Landscapes, and Isometry Groups Acting on Finite Metric Spaces.

A two-valued fitness landscape is introduced for the classical Eigen's quasispecies model. This fitness landscape can be considered as a direct generalization of the so-called single- or sharply peaked landscape. A general, non-permutation invariant quasispecies model is studied, and therefore the dimension of the problem is [Formula: see text], where N is the sequence length. It is shown that ...



Exchangeable partitions derived from Markovian coalescents with simultaneous multiple collisions (arXiv:0707.1606v1 [math.PR], 11 Jul 2007)

Kingman derived the Ewens sampling formula for random partitions from the genealogy model defined by a Poisson process of mutations along lines of descent governed by a simple coalescent process. Möhle described the recursion which determines the generalization of the Ewens sampling formula when the lines of descent are governed by a coalescent with multiple collisions. In [7] authors exploit a...


The structure of allelic diversity in the presence of purifying selection.

In the absence of selection, the structure of equilibrium allelic diversity is described by the elegant sampling formula of Ewens. This formula has helped to shape our expectations of empirical patterns of molecular variation. Along with coalescent theory, it provides statistical techniques for rejecting the null model of neutrality. However, we still do not fully understand the statistics of t...


Volume 13

Publication date: 2018